Skip to content

Lazily convert Prism/RBS locations to CodeRange objects#436

Open
sinsoku wants to merge 1 commit intoruby:masterfrom
sinsoku:lazy-convert-code-range
Open

Lazily convert Prism/RBS locations to CodeRange objects#436
sinsoku wants to merge 1 commit intoruby:masterfrom
sinsoku:lazy-convert-code-range

Conversation

@sinsoku
Copy link
Copy Markdown
Collaborator

@sinsoku sinsoku commented Apr 13, 2026

AST node initializers were eagerly calling code_range_from_node to
convert Prism::Location/RBS::Location into CodeRange, triggering
expensive UTF-16LE encoding via String#encode (9.0% of CPU samples
in tool/dog_bench.rb).

Store the raw location in @mid_code_range_loc / @cname_code_range_loc
and convert lazily on first access via ||=. In batch mode most values
are never accessed; in LSP mode the cost is deferred from file loading
to first interactive use (hover, go-to-definition, etc.).

Also remove mid_code_range from attrs hashes to avoid comparing
unconverted locations in Node#diff.

Measured with tool/dog_bench.rb (stackprof):

  • String#encode: 89 samples (9.0%) -> 19 samples (1.9%)

@mame
Copy link
Copy Markdown
Member

mame commented Apr 28, 2026

I'm hoping that switching to UTF-8 in #442 will significantly cut down the time spent in String#encode. The CodeRange object itself still gets allocated, though. I wonder if that allocation overhead might be neglectable?

AST node initializers were eagerly calling `code_range_from_node` to
convert `Prism::Location`/`RBS::Location` into `CodeRange`, triggering
expensive UTF-16LE encoding via `String#encode` (9.0% of CPU samples
in `tool/dog_bench.rb`).

Store the raw location in `@mid_code_range_loc` / `@cname_code_range_loc`
and convert lazily on first access via `||=`. In batch mode most values
are never accessed; in LSP mode the cost is deferred from file loading
to first interactive use (hover, go-to-definition, etc.).

Also remove `mid_code_range` from `attrs` hashes to avoid comparing
unconverted locations in `Node#diff`.

Measured with `tool/dog_bench.rb` (stackprof):
- `String#encode`: 89 samples (9.0%) -> 19 samples (1.9%)
@sinsoku sinsoku force-pushed the lazy-convert-code-range branch from 08c15e6 to 61d3805 Compare May 4, 2026 10:10
@sinsoku
Copy link
Copy Markdown
Collaborator Author

sinsoku commented May 4, 2026

Thanks for the review! I've rebased the branch onto master and force-pushed.

The default position_encoding is still UTF_16LE (env.rb:326), so String#encode is still called unless utf-8 is explicitly requested.

I ran tool/dog_bench.rb again on master (c824bb81, 5 runs each):

master this PR delta
Elapsed (avg of 5) 1.402 s 1.238 s −11.7%
String#encode 100 (8.4%) 29 (2.9%) −71 samples
GC total 463 (39.0%) 364 (35.9%) −99 samples
Total samples 1187 1015 −172

Two points:

  1. Even on the default UTF-16LE path, String#encode still takes 8.4%. Lazy conversion brings it down to 2.9%.
  2. The CodeRange allocation itself also matters — GC samples drop by ~99 (about 8% of total). This gain doesn't depend on position_encoding, so UTF-8 users benefit too.

So I think lazy conversion is still worth it, even after #442.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants